Disclosed herein is a method for generating a nucleic acid library, the method involving: (a) providing a population of single-stranded
nucleic acid templates, each of the templates including a coding sequence and an operably linked promoter sequence; (b) hybridizing to
the population of single-stranded nucleic acid templates a mixture of substantially complementary single-stranded nucleic acid
fragments, the fragments being shorter in length than the nucleic acid template; (c) contacting each of the hybridization products of
step (b) with both a DNA polymerase which lacks strand displacement activity and a DNA ligase under conditions in which the fragments
act as primers for the completion of a second nucleic acid strand which is substantially complementary to the nucleic acid template;
and (d) contacting the products of step (c) with RNA polymerase to generate an RNA library, the library being transcribed from the
second nucleic acid strand.
METHODS FOR GENERATING
HIGHLY DIVERSE LIBRARIES

Background of the Invention

In general, this invention relates to methods for generating and
altering recombinant libraries.

The ability to isolate a desired nucleic acid or amino acid sequence
requires the availability of recombinant libraries of sufficient number and
diversity that a particular species is represented in the library and can be
identified by one or more screening techniques. Such libraries facilitate the
isolation of useful compounds, including therapeutics, research diagnostics, and
agricultural reagents, as well as their coding sequences.

Moreover, desirable libraries may be specifically designed to contain
large numbers of possible variants of a single compound. This type of library
may be used to screen for improved versions of the compound, for example, for
a compound variant having optimized therapeutic efficacy.

For these or any other application, general approaches for increasing
library diversity are very useful and represent an important focus of the protein
design industry.

In general, the present invention features a method for generating a
nucleic acid library, the method involving: (a) providing a population of
single-stranded nucleic acid templates, each of the templates including a
coding sequence and an operably linked promoter sequence; (b) hybridizing to
the population of single-stranded nucleic acid templates a mixture of
substantially complementary single-stranded nucleic acid fragments, the
fragments being shorter in length than the nucleic acid template; (c)
contacting each of the hybridization products of step (b) with both a DNA
polymerase which lacks strand displacement activity and a DNA ligase under
conditions in which the fragments act as primers for the completion of a
second nucleic acid strand which is substantially complementary to the
nucleic acid template; and (d) contacting the products of step (c) with RNA
polymerase to generate an RNA library, the library being transcribed from the
second nucleic acid strand.

In preferred embodiments, the method is used to introduce one or
more mutations into the library; the mixture of substantially complementary
single-stranded nucleic acid fragments is generated by cleaving a
double-stranded nucleic acid molecule; the mixture of substantially
complementary single-stranded nucleic acid fragments is generated by
synthesis of random oligonucleotides; the single-stranded nucleic acid
template is generated using an M13 phage carrying the nucleic acid, by
digestion of one strand of a double-stranded nucleic acid template using gene
VI exonuclease or lambda exonuclease, by capture of a biotinylated single
nucleic acid strand using streptavidin, or by reverse transcription of RNA;
step (b) is carried out using between 1 and approximately 1000 fragments per
single-stranded nucleic acid template; a single strand of the product of step
(c) is used as a nucleic acid template and steps (b) and (c) are repeated;
steps (b) and (c) are repeated, using, in each round, the product of step (c)
as the nucleic acid template; the method further involves providing one or
more single-stranded nucleic acid fragments which form a homoduplex with the
single-stranded nucleic acid template and carrying out step (b) in the
presence of the homoduplex-forming fragments; the promoter is a T7 promoter;
the DNA polymerase is T4 DNA polymerase; the method further involves
amplifying the product of step (c) prior to said contacting step (d); the
method further involves the step of: (e) translating the RNA library to
generate a protein library; the method further involves the step of: (e)
linking to the 3' terminus of the coding sequence of each of substantially
all of the members of the RNA library an amino acid acceptor molecule; and
the method further involves the step of: (f) translating the RNA library to
generate an RNA-protein fusion library.

In a second aspect, the invention features a method for reducing
sequence variation in a population of nucleic acid molecules, the method
involving: (a) providing a first population of single-stranded nucleic acid
templates of varying sequence, each of substantially all of the templates
including a coding sequence and an operably linked promoter sequence; (b)
hybridizing to the members of the first population a second population of
substantially complementary single-stranded nucleic acid fragments, the
fragments being shorter in length than the nucleic acid template and the
fragments being of substantially identical sequence; (c) contacting the
hybridization products of step (b) with both a DNA polymerase which lacks
strand displacement activity and a DNA ligase under conditions in which the
fragments act as primers for the completion of a second nucleic acid strand
which is substantially complementary to the nucleic acid template; and (d)
contacting the products of step (c) with RNA polymerase to generate a
population of RNA molecules, the population of RNA molecules being
transcribed from the second nucleic acid strand and having reduced sequence
variation relative to the first population of single-stranded nucleic acid
templates.

In preferred embodiments, the method is used to remove one or more
mutations from the first population of single-stranded nucleic acid templates;
step (b) involves hybridization of the first population of single-stranded nucleic
acid templates to two or more different populations of substantially
complementary single-stranded nucleic acid fragments; the second population
of substantially complementary single-stranded nucleic acid fragments is
generated by cleaving a double-stranded nucleic acid molecule; the second
population of substantially complementary single-stranded nucleic acid
fragments is generated by synthesis of random oligonucleotides; the
single-stranded nucleic acid template is generated using an M13 phage carrying the
nucleic acid, by digestion of one strand of a double-stranded nucleic acid
template using gene VI exonuclease or lambda exonuclease, by capture of a
biotinylated single nucleic acid strand using streptavidin, or by reverse
transcription of RNA; step (b) is carried out using between 1 and approximately
1000 single-stranded nucleic acid fragments per single-stranded nucleic acid
template; a single strand of the product of step (c) is used as a nucleic acid
template and steps (b) and (c) are repeated; steps (b) and (c) are repeated, using,
in each round, the product of step (c) as the nucleic acid template; the promoter
is a T7 promoter; the DNA polymerase is T4 DNA polymerase; the method
further involves amplifying the product of step (c) prior to said contacting step
(d); the method further involves the step of: (e) translating the population of
RNA molecules to generate a protein library; the method further involves the
step of: (e) linking to the 3' terminus of the coding sequence of each of
substantially all of the members of the population of RNA molecules an amino
acid acceptor molecule; and the method further involves the step of: (f)
translating the population of RNA molecules to generate an RNA-protein
fusion library.

In a third aspect, the invention features a method for generating a
nucleic acid library, the method involving: (a) providing a population of
single-stranded nucleic acid templates, each of the templates including a coding
sequence; (b) providing a population of single-stranded nucleic acid molecules
of varying sequence, the population of single-stranded nucleic acid templates
and the population of single-stranded nucleic acid molecules of varying
sequence being substantially complementary; (c) hybridizing the population of
single-stranded nucleic acid templates with the population of single-stranded
nucleic acid molecules of varying sequence under conditions sufficient to form
duplexes; and (d) contacting the duplexes with one or more excision/repair
enzymes under conditions that allow the enzymes to correct mismatched base
pairs in the duplexes.

In preferred embodiments, the method further involves providing a
population of single-stranded templates derived from the product of step (d)
and repeating steps (c) and (d); and the steps (c) and (d) are repeated, using, in
each round, a population of single-stranded templates derived from the product
of step (d).

In a fourth aspect, the invention features a method for generating a
nucleic acid library, the method involving: (a) providing a population of
single-stranded nucleic acid templates, each of the templates including a coding
sequence; (b) hybridizing to the population of single-stranded nucleic acid
templates a mixture of substantially complementary single-stranded nucleic
acid fragments, the fragments being shorter in length than the nucleic acid
template; (c) contacting each of the hybridization products of step (b) with both
a DNA polymerase which lacks strand displacement activity and a DNA ligase
under conditions in which the fragments act as primers for the completion of a
second nucleic acid strand which is substantially complementary to the nucleic
acid template; and (d) contacting the products of step (c) with one or more
excision/repair enzymes under conditions that allow the enzymes to correct
mismatched base pairs in the products.

In preferred embodiments, the method further involves providing a
population of single-stranded templates derived from the product of step (d)
and repeating steps (b) - (d); and steps (b) - (d) are repeated, using, in each
round, a population of single-stranded templates derived from the product of
step (d).

In preferred embodiments of the third and fourth aspects of the
invention, the contacting with the excision/repair enzymes is carried out in vivo
(for example, in a bacterial cell); the contacting with the excision/repair
enzymes is carried out in vitro; the single-stranded nucleic acid template is
generated using an M13 phage carrying the nucleic acid, by digestion of one
strand of a double-stranded nucleic acid template using gene VI exonuclease or
lambda exonuclease, by capture of a biotinylated single nucleic acid strand
using streptavidin, or by reverse transcription of RNA; step (b) is carried out
using between 1 and approximately 1000 single-stranded nucleic acid
molecules of varying sequence or single-stranded nucleic acid fragments per
single-stranded nucleic acid template; the method further involves the step of:
(e) amplifying the product of step (d); each of the coding sequences is operably
linked to a promoter sequence; the method further involves the step of: (e)
transcribing the products of step (d) to generate an RNA library; the method
further involves the step of: (f) translating the RNA library to generate a protein
library; the method further involves the step of: (f) linking to the 3' terminus of
the coding sequence of each of substantially all of the members of the RNA
library an amino acid acceptor molecule; and the method further involves the
step of: (g) translating the RNA library to generate an RNA-protein fusion
library.



As used herein, by a "library" is meant at least 10^ preferably, at
least 10'°, more preferably, at least 10'^ and, most preferably, at least 10''
molecules having a nucleic acid and/or an amino acid component.

By a "mixture" of nucleic acid fragments is meant at least 100,
preferably, at least 500, more preferably, at least 1000, and, most preferably, at
least 1500 nucleic acid fragments.

By a "promoter sequence" is meant any nucleic acid sequence which
provides a functional RNA polymerase binding site and which is sufficient to
allow transcription of a proximal coding sequence.

By "substantially complementary" is meant that a nucleic acid strand
possesses a sufficient number of nucleotides which are capable of forming
matched Watson-Crick base pairs with a second nucleic acid strand to produce
one or more regions of double-strandedness between the two nucleic acids. It
will be understood that each nucleotide in a nucleic acid molecule need not
form a matched Watson-Crick base pair with a nucleotide in an opposing strand
to be substantially complementary, and that in a "mixture of substantially
complementary single-stranded nucleic acid fragments," a significant fraction
of the fragments will contain one or more nucleotides which form mismatches
with the "single-stranded nucleic acid template."

By "strand displacement activity" is meant the ability of a
polymerase or its associated helicase to disrupt base pairing between two
nucleic acid strands.

By "mutation" is meant any nucleotide change and includes
sequence alterations that result in phenotypic differences as well as changes
which are silent.

By "duplex" is meant a structure formed between two
annealed nucleic acid strands in which sufficient sequence complementarity
exists between the strands to maintain a stable hybridization complex. A
duplex may be either a "homoduplex," in which all of the nucleotides in the
first strand appropriately base pair with all of the nucleotides in the second
opposing strand, or a heteroduplex. By a "heteroduplex" is meant a structure
formed between two annealed strands of nucleic acid in which one or more
nucleotides in the first strand do not or cannot appropriately base pair with one
or more nucleotides in the second opposing complementary strand because of
one or more mismatches. Examples of different types of heteroduplexes
include those which exhibit an exchange of one or several nucleotides, and
insertion or deletion mutations.
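
The homoduplex/heteroduplex distinction can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: both strands are written 5' to 3', have equal length, and differ only by base substitutions (insertion or deletion heteroduplexes are not modeled). The function names `reverse_complement` and `classify_duplex` are hypothetical and do not appear in the specification.

```python
# Illustrative sketch only: classifies an equal-length duplex as a
# homoduplex or a substitution-type heteroduplex.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the Watson-Crick reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def classify_duplex(first, second):
    """Classify two annealed strands, both written 5'->3'.

    Returns a (label, mismatch_positions) pair, where positions are
    0-based indices along the first strand.
    """
    if len(first) != len(second):
        raise ValueError("this sketch only handles strands of equal length")
    # The base each position of `first` is actually paired with,
    # expressed in the same orientation as `first`.
    partner = reverse_complement(second)
    mismatches = [i for i, (a, b) in enumerate(zip(first, partner)) if a != b]
    label = "homoduplex" if not mismatches else "heteroduplex"
    return label, mismatches
```

For example, `classify_duplex("ATGC", "GCAT")` reports a homoduplex, whereas changing the last base of the second strand produces a heteroduplex with a single mismatch at the first position of the first strand.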

By "random oligonucleotides" is meant a mixture of oligonucleotides
having sequence variation at one or more nucleotide positions. Random
oligonucleotides may be produced using entirely random or partially random
synthetic approaches or by intentionally altering an oligonucleotide in a
directed fashion.

By an "amino acid acceptor molecule" is meant any molecule
capable of being added to the C-terminus of a growing protein chain by the
catalytic activity of the ribosomal peptidyl transferase function. Typically,
such molecules contain (i) a nucleotide or nucleotide-like moiety (for example,
adenosine or an adenosine analog (di-methylation at the N-6 amino position is
acceptable)), (ii) an amino acid or amino acid-like moiety (for example, any of
the 20 D- or L-amino acids or any amino acid analog thereof (for example,
O-methyl tyrosine or any of the analogs described by Ellman et al., Meth.
Enzymol. 202:301, 1991)), and (iii) a linkage between the two (for example, an
ester, amide, or ketone linkage at the 3' position or, less preferably, the 2'
position); preferably, this linkage does not significantly perturb the pucker of
the ring from the natural ribonucleotide conformation. Amino acid acceptors
may also possess a nucleophile, which may be, without limitation, an amino
group, a hydroxyl group, or a sulfhydryl group. In addition, amino acid
acceptors may be composed of nucleotide mimetics, amino acid mimetics, or
mimetics of a combined nucleotide-amino acid structure.
By an amino acid acceptor being linked "to the 3' terminus" of a
coding sequence is meant that the amino acid acceptor molecule is positioned
after the final codon of that coding sequence. This term includes, without
limitation, an amino acid acceptor molecule that is positioned precisely at the 3'
end of the coding sequence as well as one which is separated from the final
codon by intervening coding or non-coding sequence (for example, a sequence
corresponding to a pause site). This term also includes constructs in which
coding or non-coding sequences follow (that is, are 3' to) the amino acid
acceptor molecule. In addition, this term encompasses, without limitation, an
amino acid acceptor molecule that is covalently bonded (either directly or
indirectly through intervening nucleic acid sequence) to the coding sequence,
as well as one that is joined to the coding sequence by some non-covalent
means, for example, through hybridization using a second nucleic acid
sequence that binds at or near the 3' end of the coding sequence and that itself is
bound to an amino acid acceptor molecule.
By an "RNA-protein" fusion is meant any molecule that includes a
ribonucleic acid covalently bonded through an amide bond to a protein. This
covalent bond is resistant to cleavage by a ribosome.

By a "protein" is meant any two or more naturally occurring or
modified amino acids joined by one or more peptide bonds. "Protein,"
"peptide," and "polypeptide" are used interchangeably herein.

By a "population of single-stranded templates of varying sequence"
is meant that the nucleic acid species of the population possess sequences
which differ at one or more nucleotide positions.

By "excision/repair enzymes" is meant any combination of enzymes
sufficient to replace a mismatched base pair or loop with a standard base pair
(i.e., A:T or G:C).

Brief Description of the Drawings
FIGURE 1 is a schematic representation of an exemplary fragment 
recombination method for generating highly diverse RNA-protein fusion 
libraries. In this method, the fragments are derived from a double-stranded 
DNA molecule into which sequence variation is introduced. 

FIGURE 2 is a schematic representation of the initial steps of a
second exemplary fragment recombination method. In this method, the
fragments are synthetic oligonucleotides into which sequence variation is
introduced.

Detailed Description 
The present invention involves a number of novel and related 
methods for the random recombination of nucleic acid sequences, facilitating
the generation of DNA, RNA, and protein libraries into which genetic 
alterations have been introduced. As described in more detail below, in one 
preferred embodiment, this technique is carried out in vitro and is used to 
generate traditional protein libraries or RNA-protein fusion libraries, either of 
which may then be used in combination with any of a variety of methods for 
the selection of desired proteins or peptides (or their corresponding coding 
sequences) from library populations. This general approach provides a means 
for the introduction of mutations into protein libraries in an unbiased fashion 
and also provides a technique by which unfavorable mutations may be removed
from a library or selected pool, or "backcrossed" out of a population of
molecules during subsequent rounds of selection.



According to one preferred method of the invention, a library is
generated by the production of mutant fragments and the random
recombination of these fragments with an unmutated (typically, wild-type)
sequence. One example of this general approach is shown in Figure 1. As
indicated in this figure, mutations are first randomly introduced into an initial
double-stranded DNA sequence (termed "dsDNA(init)"). This produces a
population of mutant double-stranded DNA sequences, which, in Figure 1, is
termed "dsDNA(mut)." These mutations may be introduced by any technique,
including PCR mutagenesis (which relies on the poor error-proofing
mechanism of Taq polymerase), site-directed mutagenesis, or template-directed
mutagenesis (for example, as described in Joyce and Inoue, Nucl. Acids Res.
17:711, 1989). The DNA in this mutation-containing population is subsequently
fragmented using any of a variety of standard methods. For example, the DNA
may be partially degraded using one or more nucleases (such as DNase I,
micrococcal nuclease, restriction endonucleases, or P1 nuclease), or may be
fragmented chemically using, for example, Fe·EDTA. Alternatively,
mutation-containing fragments may be generated by limited nucleotide consumption
during polymerization (for example, during PCR amplification), or by simple
physical shearing (for example, by sonication). Preferable fragment sizes range
from 25-1000 base pairs and are most preferably in the range of 50-150 base
pairs.
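
The fragment size ranges above can be explored with a small in silico sketch. The following Python function is a hypothetical illustration, not a description of any nuclease or shearing chemistry: it assumes cut points are placed so that piece lengths are drawn uniformly from the most preferred 50-150 base range, and the name `fragment` and its parameters are inventions for this example.

```python
import random


def fragment(seq, min_len=50, max_len=150, rng=None):
    """Cut seq into consecutive pieces with lengths drawn uniformly
    from [min_len, max_len].

    Returns a list of (start, piece) tuples. The final piece is whatever
    remains and may therefore be shorter than min_len.
    Assumption: uniform length sampling is a stand-in for real
    fragmentation, which is not uniform.
    """
    rng = rng or random.Random()
    pieces = []
    pos = 0
    while pos < len(seq):
        size = rng.randint(min_len, max_len)
        pieces.append((pos, seq[pos:pos + size]))
        pos += size
    return pieces
```

Concatenating the pieces in order of their start positions recovers the input sequence, which makes the function convenient for testing downstream reassembly sketches.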

The DNA fragments are then heated and subsequently annealed to a
full-length single-stranded DNA template which is identical to the initial DNA
in sequence and which is the non-coding (or minus) strand of that DNA. In
addition, in this hybridization mixture is included a second type of fragment,
sometimes referred to as a "terminator fragment" (Joyce and Inoue, Nucl.
Acids Res. 17:711, 1989). This terminator fragment is complementary to the 3'
end of the single-stranded template and provides a polymerization primer
which binds to the template in a manner that is relatively independent of the
number or nature of the randomly annealed, mutation-containing fragments.

Single-stranded templates may be generated by any standard
technique, for example, by using an M13 phage carrying the DNA sequence, by
digestion of the coding strand of a dsDNA(init) molecule using gene VI
exonuclease (Nikiforov et al., PCR Methods Appl. 3:285, 1994) or lambda
exonuclease (Higuchi and Ochman, Nucl. Acids Res. 17:5365, 1989), by
capture of a biotinylated DNA strand using immobilized streptavidin (Joyce
and Inoue, Nucl. Acids Res. 17:711, 1989), or by reverse transcription of RNA.
To carry out the template-fragment hybridization, templates are mixed with
fragments using no less than one fragment molecule per template molecule and
no more than approximately 1000 fragment molecules per template molecule.
A low ratio of fragments to templates produces progeny strands that closely
resemble the templates, whereas a higher ratio produces progeny that more
closely resemble the fragments. Hybridization conditions are determined by
standard methods and are designed to allow for the formation of heteroduplexes
between the template and the fragments. Exemplary hybridization techniques
are described, for example, in Stemmer, U.S. Patent No. 5,605,793.
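
The qualitative effect of the fragment-to-template ratio can be sketched with a toy occupancy model. This model is an assumption introduced here for illustration and is not taken from the specification: the template is divided into a fixed number of fragment-sized windows, each fragment anneals to one window chosen uniformly at random, and a window counts as fragment-derived if at least one fragment lands on it. The function name `expected_fragment_fraction` is hypothetical.

```python
def expected_fragment_fraction(fragments_per_template, windows):
    """Expected fraction of template windows covered by a fragment.

    Toy model (an assumption, not the patent's method): each of
    `fragments_per_template` fragments lands on one of `windows`
    equally likely positions, independently of the others.
    """
    if not 1 <= fragments_per_template <= 1000:
        raise ValueError("ratio should lie between 1 and approximately 1000")
    if windows < 1:
        raise ValueError("windows must be at least 1")
    # Probability that a given window is missed by every fragment.
    missed = (1.0 - 1.0 / windows) ** fragments_per_template
    return 1.0 - missed
```

Under these assumptions, one fragment per template on a ten-window template covers about 10% of the progeny strand, while ratios in the hundreds cover essentially all of it, in line with the trend described above.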

Once annealed to the template, the fragments are joined together by
treating with both a DNA polymerase that lacks strand displacement activity
and a DNA ligase. DNA polymerases useful for this purpose include, without
limitation, T4 DNA polymerase and reconstituted DNA pol II from E. coli (see,
for example, Hughes et al., J. Biol. Chem. 266:4568, 1991). Any DNA ligase
(for example, T4 DNA ligase) may be utilized. In this step, the DNA duplexes
may be treated first with the DNA polymerase and then with the DNA ligase,
or with both enzymes simultaneously, and the step may be carried out, for
example, as described in Joyce and Inoue (Nucl. Acids Res. 17:711, 1989). As
shown in Figure 1, this step generates a population of double-stranded DNAs
(termed "dsDNA(lib)"), each member of which includes one strand typically
having one or more introduced mutations. Because both the mutations initially
introduced and the number and nature of the fragments annealed are random,
different duplexes in the population contain different mutant sequences.
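
The outcome of the polymerase and ligase step can be sketched at the sequence level. The Python function below is a simplified illustration under stated assumptions: all sequences are written in the coding sense, the annealed fragments do not overlap, and gap filling by the polymerase is modeled as copying the wild-type sequence encoded by the template. The name `assemble_progeny` and its input format are hypothetical.

```python
def assemble_progeny(wild_type, annealed):
    """Build the newly synthesized strand from annealed fragments.

    wild_type: coding-sense sequence specified by the template.
    annealed:  iterable of (start, fragment_sequence) pairs, coding sense,
               assumed not to overlap (no strand displacement is modeled).
    Positions not covered by a fragment are filled with wild-type sequence,
    standing in for polymerase extension followed by ligation.
    """
    progeny = list(wild_type)
    covered_until = 0
    for start, frag in sorted(annealed):
        end = start + len(frag)
        if start < covered_until:
            raise ValueError("fragments overlap")
        if start < 0 or end > len(wild_type):
            raise ValueError("fragment extends beyond the template")
        progeny[start:end] = frag
        covered_until = end
    return "".join(progeny)
```

For instance, annealing `"CC"` at position 2 and `"GG"` at position 6 of a ten-base run of `A` gives `"AACCAAGGAA"`: the mutant fragments are retained and the gaps carry template-derived sequence.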

An alternative to this general approach for generating a
double-stranded DNA library is shown in Figure 2. By this alternative approach,
single-stranded oligonucleotide fragments are synthesized which correspond to
portions of the coding strand of an initial double-stranded DNA molecule.
These oligonucleotide fragments preferably range from 5-2000 nucleotides, and
most preferably range from 20-100 nucleotides in length and are generated, for
example, using any standard technique of nucleic acid synthesis. These
oligonucleotides may be synthesized with completely random or semi-random
mutations by any standard technique. Preferably, such oligonucleotides include
up to 3 introduced mismatches per 20 nucleotide segment and are devoid of
in-frame stop codons. In addition, in certain cases, it may be desirable or
necessary to increase the hybridization potential of the oligonucleotide through
the introduction of non-natural, affinity-enhancing base pairs, such as C-5
propyne uridine or C-5 propyne cytidine. These techniques are described, for
example, in Wagner et al., Science 260:1510, 1993.
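
The two preferred oligonucleotide properties (up to 3 mismatches per 20 nucleotide segment, and no in-frame stop codons) can be checked computationally. The sketch below is one possible reading, with assumptions labeled: "per 20 nucleotide segment" is interpreted conservatively as every sliding 20-nucleotide window, the oligonucleotide is compared against a wild-type reference of equal length, and the reading frame is supplied by the caller. All function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def max_mismatches_per_window(oligo, reference, window=20):
    """Largest number of mismatches found in any `window`-nt stretch.

    Assumption: sliding windows are used, which is stricter than
    counting over fixed, non-overlapping segments.
    """
    if len(oligo) != len(reference):
        raise ValueError("oligo and reference must have equal length")
    diffs = [a != b for a, b in zip(oligo, reference)]
    if len(diffs) <= window:
        return sum(diffs)
    return max(sum(diffs[i:i + window])
               for i in range(len(diffs) - window + 1))


def has_in_frame_stop(oligo, frame_offset=0):
    """True if a stop codon occurs in the frame starting at frame_offset."""
    return any(oligo[i:i + 3] in STOP_CODONS
               for i in range(frame_offset, len(oligo) - 2, 3))


def oligo_is_acceptable(oligo, reference, frame_offset=0, max_mismatches=3):
    """Apply both preferred criteria to a coding-sense DNA oligo."""
    return (max_mismatches_per_window(oligo, reference) <= max_mismatches
            and not has_in_frame_stop(oligo, frame_offset))
```

A designer might run such a filter over a synthesized pool definition before ordering, discarding or redesigning any member that fails either test.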

These mutation-containing oligonucleotide fragments are next
annealed to single-stranded templates which, as above, are full-length strands
identical in sequence to the non-coding (or minus) strand of the initial DNA.
The fragments are joined together using DNA polymerase and DNA ligase,
also as described above, to create a double-stranded DNA library (dsDNA(lib)).
Again, this library contains a population of duplex molecules, containing an
array of different coding strands having mutations which differ in number,
position, and identity.

If desired, the above steps may be repeated, for either the fragment or
the oligonucleotide approach, to introduce varying numbers of mutations into a
DNA molecule. In particular, the mutated strands become the initial
single-stranded templates, and mutant fragments or oligonucleotides are annealed to
those strands and polymerized and ligated.

As generally described above, the methods of the invention are used
to introduce mutations into an initial DNA sequence. In addition, these
techniques may be used to remove or reduce in frequency undesirable
mutations from a DNA library. According to this approach, following
fragmentation of the dsDNA(mut), oligonucleotides of wild-type sequence or
specific fragments of unmutated or wild-type DNA (wtDNA) may be added to
the single-stranded template together with the dsDNA(mut) fragments or
oligonucleotides. The fragments are strand-separated (if necessary), annealed
to the full-length single-stranded template, and joined together using DNA
polymerase and DNA ligase, as described above. The use of a high
concentration of unmutated oligonucleotide or fragment, relative to the
corresponding mutant fragment, allows for the generation of libraries in which
undesirable mutations are minimized or eliminated.
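
The dilution effect behind this backcrossing step can be approximated with a one-line calculation. The sketch assumes, purely for illustration, that wild-type and mutant fragments covering the same site anneal with equal efficiency, so that the chance of a progeny strand carrying the mutation equals the mutant fragment's share of the total; real hybridization behavior may differ. The function name `expected_mutant_fraction` is hypothetical.

```python
def expected_mutant_fraction(wild_type_conc, mutant_conc):
    """Expected fraction of progeny strands retaining a given mutation.

    Assumption: both fragment types compete for the same site with equal
    annealing efficiency, so the outcome follows their relative amounts.
    Concentrations may be in any unit as long as both use the same one.
    """
    if wild_type_conc < 0 or mutant_conc < 0:
        raise ValueError("concentrations must be non-negative")
    total = wild_type_conc + mutant_conc
    if total == 0:
        raise ValueError("at least one fragment type must be present")
    return mutant_conc / total
```

Under this assumption, a ninefold excess of wild-type fragment would leave the mutation in roughly one progeny strand in ten, and repeating the procedure would reduce its frequency further.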

In addition, this approach may be used with existing
mutation-containing libraries to similarly decrease or eliminate undesirable sequences.
This approach involves an initial library having mutant sequences and the
annealing, polymerization, and ligation of fragments or oligonucleotides of
wild-type sequence, as generally described above.

In one preferred embodiment of the invention, the DNA libraries
described above further include an RNA polymerase binding site, for example,
for T7 or SP6 polymerase. Such binding sites are described, for example, in
Milligan et al., Proc. Natl. Acad. Sci. USA 87:696, 1990. This site is
positioned upstream of the coding sequence at a location which allows for
transcription of the sequence. Typically such sites are located at between
5-2000 base pairs upstream of the coding sequence.

Libraries containing RNA polymerase binding sites may be altered
as described above. Following polymerization and ligation, the dsDNA(lib)
may be transcribed directly, for example, using an in vitro transcription system,
to generate an RNA library. Alternatively, the dsDNA(lib) may be transcribed
and translated directly, for example, using in vitro transcription and translation
systems, to generate a protein library. Exemplary in vitro transcription systems
and in vitro translation systems include T7 transcription systems, and rabbit
reticulocyte, wheat germ, yeast, and E. coli translation systems.

If desired, the number of copies of each RNA or protein in the library
may be increased by including a strand-specific amplification step prior to
transcription. For example, PCR amplification may be carried out by
incorporating unique primer-binding sequences into the
mutant strand during the polymerization and ligation steps. These sequences
may be incorporated as either mismatches or sequence extensions at one or
both ends of the DNA, allowing amplification of the newly-synthesized strand
without amplification of the template strand. Alternatively, linear amplification
can be achieved by multiple cycles of annealing and extension of a single
oligonucleotide primer that is complementary to the 3' end of the
newly-synthesized strand. Subsequent PCR and transcription steps produce a
majority of RNA corresponding to mutant sequences with only a small
proportion of template-derived sequences.

In one preferred approach, the above methods for introducing
mutations or for backcrossing out undesirable mutations may be used to
produce highly diverse RNA-protein libraries. Such libraries may be
constructed by ligating linkers containing a non-hydrolyzable amino acid
acceptor molecule, such as puromycin, to the 3' termini of the RNAs in a
library (for example, produced as described above). Exemplary techniques for
generating RNA-protein fusions are described, for example, in Szostak et al.,
U.S.S.N. 09/007,005; and Roberts et al., Proc. Natl. Acad. Sci. USA 94:12297,
1997. Subsequent translation of these RNAs generates a library of
RNA-protein fusion molecules that may subsequently be used in in vitro selection
experiments.

In addition, if desired, RNA or RNA-protein fusion molecules, once
selected, may be used as templates in standard PCR reactions to obtain the
corresponding coding sequence. Thus, this method provides a means for
carrying out fragment recombination, molecular backcrossing, selection of
proteins and/or peptides, and selection of their corresponding coding
sequences, all in an in vitro system.



In addition to fragment recombination approaches, excision/repair
may also be used to alter library sequences. This approach may be used to
generate DNA, RNA, and RNA-protein fusion libraries. This technique relies
on the fact that the dsDNA(lib)s, produced by any of the methods described
above, by their nature, contain a certain number of mismatched base pairs. To
generate diversity in the library sequences, these mismatches are repaired in
vitro by excision/repair enzymes. This may be carried out using any excision
repair system (for example, as described in Jaiswal et al., Nucl. Acids Res.
26:2184, 1998; or Fortini et al., Biochemistry 37:3575, 1998).

Alternatively, the excision/repair step may be carried out by
transforming a dsDNA(lib) into a bacterial or yeast strain and exploiting the
bacterial or yeast repair systems in vivo. Again, this step may be carried out by
transforming the library into any standard in vivo excision/repair system.
Exemplary systems are described, without limitation, in Campbell et al., Mutat.
Res. 211:181, 1989; Bishop and Kolodner, Mol. Cell Biol. 6:3401, 1986; Fishel
et al., J. Mol. Biol. 188:147, 1986; and Westmoreland et al., Genetics 145:29,
1997.

Because the above repair processes are random, this excision/repair
method sometimes results in the introduction of mutations into a library
sequence and at other times results in the backcrossing of wild-type sequence
alterations into the coding strand.
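
The random outcome of mismatch repair can be illustrated with a small simulation. This sketch rests on assumptions that are not stated in the specification: both strands are represented in the coding sense, each mismatch is resolved independently, and either strand is equally likely to serve as the repair template. The function name `repair_heteroduplex` is hypothetical.

```python
import random


def repair_heteroduplex(wild_type, mutant, rng=None):
    """Resolve each mismatch in a heteroduplex toward one strand at random.

    wild_type, mutant: equal-length sequences in the coding sense.
    Assumption: each mismatch is repaired independently with a 50/50
    choice of strand; real repair systems can be strand- or
    sequence-biased.
    """
    if len(wild_type) != len(mutant):
        raise ValueError("this sketch only handles substitution mismatches")
    rng = rng or random.Random()
    repaired = []
    for wt_base, mut_base in zip(wild_type, mutant):
        if wt_base == mut_base:
            repaired.append(wt_base)
        else:
            # Either the mutation is fixed in the duplex or the
            # wild-type base is restored (backcrossing).
            repaired.append(rng.choice((wt_base, mut_base)))
    return "".join(repaired)
```

Running the function repeatedly on the same heteroduplex yields a mixture of products, some retaining a given mutation and some reverting it, which is the source of diversity described above.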

In an alternative to the above approaches, in vitro or in vivo
excision/repair may also be used directly to generate diverse libraries using as a
substrate a mixture of dsDNA(mut) (for example, produced as described above)
and dsDNA(init) or wtDNA. In this technique, the mixture is strand-separated
and reannealed, and is then either incubated in vitro with excision/repair
enzymes or transformed into bacteria to utilize the bacterial excision/repair
system (for example, as described above). In this manner, mutations may be
randomly introduced into a sequence, and wild-type sequences may be
backcrossed into dsDNA(mut) molecules.



What is claimed is:
1. A method for generating a nucleic acid library, said method
comprising: 

(a) providing a population of single-stranded nucleic acid templates, 
each of said templates comprising a coding sequence and an operably linked 
promoter sequence; 

(b) hybridizing to said population of single-stranded nucleic acid
templates a mixture of substantially complementary single-stranded nucleic
acid fragments, said fragments being shorter in length than said nucleic acid
template;

(c) contacting each of the hybridization products of step (b) with
both a DNA polymerase which lacks strand displacement activity and a DNA
ligase under conditions in which said fragments act as primers for the
completion of a second nucleic acid strand which is substantially
complementary to said nucleic acid template; and

(d) contacting the products of step (c) with RNA polymerase to
generate an RNA library, said library being transcribed from said second 
nucleic acid strand. 

2. The method of claim 1, wherein said method is used to introduce 
one or more mutations into said library. 

3. A method for reducing sequence variation in a population of 
nucleic acid molecules, said method comprising: 

(a) providing a first population of single-stranded nucleic acid 
templates of varying sequence, each of substantially all of said templates 
comprising a coding sequence and an operably linked promoter sequence;

(b) hybridizing to the members of said first population a second
population of substantially complementary single-stranded nucleic acid
fragments, said fragments being shorter in length than said nucleic acid
template and said fragments being of substantially identical sequence;

(c) contacting the hybridization products of step (b) with both a DNA
polymerase which lacks strand displacement activity and a DNA ligase under
conditions in which said fragments act as primers for the completion of a
second nucleic acid strand which is substantially complementary to said nucleic
acid template; and

(d) contacting the products of step (c) with RNA polymerase to
generate a population of RNA molecules, said population of RNA molecules
being transcribed from said second nucleic acid strand and having reduced
sequence variation relative to said first population of single-stranded nucleic
acid templates.

4. The method of claim 3, wherein said method is used to remove
one or more mutations from said first population of single-stranded nucleic acid
templates.

5. The method of claim 3, wherein step (b) comprises hybridization
of said first population of single-stranded nucleic acid templates to two or more
different populations of substantially complementary single-stranded nucleic
acid fragments.



6. The method of claim 1 or 3, wherein said mixture of substantially
complementary single-stranded nucleic acid fragments is generated by cleaving
a double-stranded nucleic acid molecule or by synthesis of random
oligonucleotides.

7. The method of claim 1 or 3, wherein said single-stranded nucleic
acid template is generated using an M13 phage carrying said nucleic acid, by
digestion of one strand of a double-stranded nucleic acid template using gene
VI exonuclease or lambda exonuclease, by capture of a biotinylated single
nucleic acid strand using streptavidin, or by reverse transcription of RNA.

8. The method of claim 1 or 3, wherein said mixture of substantially
complementary single-stranded nucleic acid fragments comprises at least about
100 different species of nucleic acid fragments.

9. The method of claim 1 or 3, wherein step (b) is carried out using 
between 1 and approximately 1000 fragments per single-stranded nucleic acid 
template.

10. The method of claim 1 or 3, wherein a single strand of the
product of step (c) is used as a nucleic acid template and steps (b) and (c) are
repeated.

11. The method of claim 10, wherein said steps (b) and (c) are
repeated, using, in each round, the product of step (c) as said nucleic acid
template.

12. The method of claim 1, wherein said method further comprises
providing one or more single-stranded nucleic acid fragments which form a
homoduplex with said single-stranded nucleic acid template and carrying out
step (b) in the presence of said homoduplex-forming fragments.

13. The method of claim 1 or 3, wherein said promoter is a T7
promoter.

14. The method of claim 1 or 3, wherein said DNA polymerase is T4
DNA polymerase.

15. The method of claim 1 or 3, wherein said method further
comprises amplifying said product of step (c) prior to said contacting step (d).

16. The method of claim 1 or 3, wherein said method further
comprises the step of:

(e) translating said RNA library to generate a protein library.

17. The method of claim 1 or 3, wherein said method further
comprises the step of:

(e) linking to the 3' terminus of said coding sequence of each of
substantially all of the members of said RNA library an amino acid acceptor
molecule.

18. The method of claim 17, wherein said method further comprises
the step of:

(f) translating said RNA library to generate an RNA-protein fusion
library.



19. A method for generating a nucleic acid library, said method
comprising:

(a) providing a population of single-stranded nucleic acid templates,
each of said templates comprising a coding sequence;

(b) providing a population of single-stranded nucleic acid molecules
of varying sequence, said population of single-stranded nucleic acid templates
and said population of single-stranded nucleic acid molecules of varying
sequence being substantially complementary;

(c) hybridizing said population of single-stranded nucleic acid
templates with said population of single-stranded nucleic acid molecules of
varying sequence under conditions sufficient to form duplexes; and

(d) contacting said duplexes with one or more excision/repair
enzymes under conditions that allow said enzymes to correct mismatched base
pairs in said duplexes.



20. The method of claim 19, wherein said method further comprises
providing a population of single-stranded templates derived from the product of
step (d) and repeating steps (c) and (d).

21. The method of claim 20, wherein said steps (c) and (d) are
repeated, using, in each round, a population of single-stranded templates
derived from the product of step (d).

22. A method for generating a nucleic acid library, said method 
comprising: 

(a) providing a population of single-stranded nucleic acid templates, 
each of said templates comprising a coding sequence; 
(b) hybridizing to said population of single-stranded nucleic acid
templates a mixture of substantially complementary single-stranded nucleic
acid fragments, said fragments being shorter in length than said nucleic acid
template;

(c) contacting each of the hybridization products of step (b) with 
both a DNA polymerase which lacks strand displacement activity and a DNA 
ligase under conditions in which said fragments act as primers for the 
completion of a second nucleic acid strand which is substantially 
complementary to said nucleic acid template; and 

(d) contacting the products of step (c) with one or more
excision/repair enzymes under conditions that allow said enzymes to correct
mismatched base pairs in said products.

23. The method of claim 22, wherein said method further comprises
providing a population of single-stranded templates derived from the product of
step (d) and repeating steps (b) - (d).

24. The method of claim 23, wherein said steps (b) - (d) are
repeated, using, in each round, a population of single-stranded templates
derived from the product of step (d).

25. The method of claim 19 or 22, wherein said contacting with said
excision/repair enzymes is carried out in vivo.

26. The method of claim 25, wherein said contacting with said 
excision/repair enzymes is carried out in a bacterial cell. 



27. The method of claim 19 or 22, wherein said contacting with said
excision/repair enzymes is carried out in vitro.

28. The method of claim 19 or 22, wherein said single-stranded
nucleic acid template is generated using an M13 phage carrying said nucleic
acid, by digestion of one strand of a double-stranded nucleic acid template
using gene VI exonuclease or lambda exonuclease, by capture of a biotinylated
single nucleic acid strand using streptavidin, or by reverse transcription of
RNA.

29. The method of claim 19 or 22, wherein step (b) is carried out
using between 1 and approximately 1000 single-stranded nucleic acid
molecules of varying sequence or single-stranded nucleic acid fragments per
single-stranded nucleic acid template.

30. The method of claim 19 or 22, wherein said method further
comprises the step of:

(e) amplifying said product of step (d).

31. The method of claim 19 or 22, wherein each of said coding
sequences is operably linked to a promoter sequence. 

32. The method of claim 31, wherein said method further comprises 
the step of: 

(e) transcribing the products of step (d) to generate an RNA library. 

33. The method of claim 32, wherein said method further comprises
the step of:

(f) translating said RNA library to generate a protein library.

34. The method of claim 32, wherein said method further comprises
the step of:

(f) linking to the 3' terminus of said coding sequence of each of
substantially all of the members of said RNA library an amino acid acceptor
molecule.

35. The method of claim 34, wherein said method further comprises
the step of:

(g) translating said RNA library to generate an RNA-protein fusion
library.


